
Conversation


@lishunyao97 lishunyao97 commented Jul 21, 2025

Why are these changes needed?

[PyTorch 2.6 perf fix]
The Ray monitoring process (agent.py) collects memory stats periodically via an expensive low-level kernel call (Ray call site), which caused the training process to stall. Removing the expensive kernel call fixes the regression seen with PyTorch 2.6.
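For context, here is a minimal sketch of the kind of change involved, assuming the reporter gathers per-process stats with psutil; the helper name `collect_worker_memory` is illustrative, not the actual agent.py code. On Linux, psutil's `memory_full_info()` reads `/proc/<pid>/smaps` (or `smaps_rollup`), forcing the kernel to walk the process's memory mappings, while `memory_info()` is a cheap procfs read:

```python
import psutil


def collect_worker_memory(pid: int) -> dict:
    """Illustrative sketch: cheap vs. expensive per-process memory stats."""
    proc = psutil.Process(pid)

    # Expensive variant: needed for USS/swap, but each call scans the
    # process's memory mappings and can stall busy workers.
    # full = proc.memory_full_info()
    # return {"rss": full.rss, "uss": full.uss}

    # Cheap variant: RSS/VMS come from a single small procfs read.
    info = proc.memory_info()
    return {"rss": info.rss, "vms": info.vms}
```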

Related issue number

The call was first flagged as expensive in 2019:
Reduce reporter CPU by ericl · Pull Request #6553 · ray-project/ray

and was reintroduced in 2022:
[Core] Export additional metrics for workers and Raylet memory by mwtian · Pull Request #25418 · ray-project/ray

As a next step, we would like to contribute to Ray OSS by exposing the allowed metrics as a config (a rough sketch follows below).
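A hypothetical sketch of what such a config could look like; the environment variable name and helper are illustrative only, not an existing Ray setting:

```python
import os

import psutil

# Hypothetical opt-in flag; the expensive USS/swap collection stays off by default.
FULL_MEMORY_INFO = os.environ.get("RAY_REPORTER_FULL_MEMORY_INFO", "0") == "1"


def memory_stats(proc: psutil.Process) -> dict:
    if FULL_MEMORY_INFO:
        full = proc.memory_full_info()  # expensive: scans memory mappings for USS
        return {"rss": full.rss, "uss": full.uss}
    info = proc.memory_info()  # cheap default
    return {"rss": info.rss, "vms": info.vms}
```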

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@lishunyao97 lishunyao97 changed the title from "Remove expensive memory_full_info kernel call" to "[PT 2.6 perf fix] Remove expensive memory_full_info kernel call" on Jul 21, 2025

@ShaochenYu-YW ShaochenYu-YW left a comment

Thanks a lot for fixing it!

@ShaochenYu-YW ShaochenYu-YW merged commit e187cad into pinterest/main-2.10.0 Jul 21, 2025
1 check failed